Automatic Lemmatisation of Lithuanian MWEs
نویسندگان
چکیده
This article presents a study of lemmatisation of flexible multiword expressions in Lithuanian. An approach based on syntactic analysis designed for multiword term lemmatisation was adapted for a broader range of MWEs taken from the Dictionary of Lithuanian Nominal Phrases. In the present analysis, the main lemmatisation errors are identified and some improvements are proposed. It shows that automatic lemmatisation can be improved by taking into account the whole set of grammatical forms for each MWE. It would allow selecting the optimal grammatical form for lemmatisation and identifying some grammatical restrictions.
منابع مشابه
Automatic Inference of Base Forms for Multiword Terms in Lithuanian
This paper reports on a specific problem of automatic terminology extraction in Lithuanian – base form inference. While the process of lemmatisation is properly carried out by existing tools, problems arise with normalizing multiword terms. It can be described as the discrepancy between the base form (i. e. lemma) of a term and the sequence of the base forms of constituent lexical items within ...
متن کاملCzech MWE Database
In this paper we deal with a recently developed large Czech MWE database containing at the moment 160 000 MWEs (treated as lexical units). It was compiled from various resources such as encyclopedias and dictionaries, public databases of proper names and toponyms, collocations obtained from Czech WordNet, lists of botanical and zoological terms and others. We describe the structure of the datab...
متن کاملExploring Features for Named Entity Recognition in Lithuanian Text Corpus
Despite the existence of effective methods that solve named entity recognition tasks for such widely used languages as English, there is no clear answer which methods are the most suitable for languages that are substantially different. In this paper we attempt to solve a named entity recognition task for Lithuanian, using a supervised machine learning approach and exploring different sets of f...
متن کاملParsing Modern Greek verb MWEs with LFG/XLE grammars
We report on the first, still on-going effort to integrate verb MWEs in an LFG grammar of Modern Greek (MG). Text is lemmatized and tagged with the ILSP FBT Tagger and is fed to a MWE filter that marks Words_With_Spaces in MWEs. The output is then formatted to feed an LFG/XLE grammar that has been developed independently. So far we have identified and classified about 2500 MWEs, and have proces...
متن کاملA Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
A verb-noun Multi-Word Expression (MWE) is a combination of a verb and a noun with or without other words, in which the combination has a meaning different from the meaning of the words considered separately. In this paper, we present a new lexical resource of Hebrew Verb-Noun MWEs (VN-MWEs). The VN-MWEs of this resource were manually collected and annotated from five different web resources. I...
متن کامل